Search | VHL Regional Portal

A supervised learning method for classifying methylation disorders.

Walsh, Jesse R; Sun, Guangchao; Balan, Jagadheshwar; Hardcastle, Jayson; Vollenweider, Jason; Jerde, Calvin; Rumilla, Kandelaria; Koellner, Christy; Koleilat, Alaa; Hasadsri, Linda; Kipp, Benjamin; Jenkinson, Garrett; Klee, Eric.

BMC Bioinformatics ; 25(1): 66, 2024 Feb 12.

Article in English | MEDLINE | ID: mdl-38347515

ABSTRACT

BACKGROUND: DNA methylation is one of the most stable and well-characterized epigenetic alterations in humans. Accordingly, it has already found clinical utility as a molecular biomarker in a variety of disease contexts. Existing methods for clinical diagnosis of methylation-related disorders focus on outlier detection in a small number of CpG sites using standardized cutoffs which differentiate healthy from abnormal methylation levels. The standardized cutoff values used in these methods do not take into account methylation patterns which are known to differ between the sexes and with age. RESULTS: Here we profile genome-wide DNA methylation from blood samples drawn from a cohort composed of healthy controls of different ages and sexes alongside patients with Prader-Willi syndrome (PWS), Beckwith-Wiedemann syndrome, Fragile-X syndrome, Angelman syndrome, and Silver-Russell syndrome. We propose a Generalized Additive Model to perform age- and sex-adjusted outlier analysis of around 700,000 CpG sites throughout the human genome. Utilizing z-scores among the cohort for each site, we deployed an ensemble-based machine learning pipeline and achieved a combined prediction accuracy of 0.96 (Binomial 95% Confidence Interval 0.868-0.995). CONCLUSION: We demonstrate a method for age- and sex-adjusted outlier detection of differentially methylated loci based on a large cohort of healthy individuals. We present a custom machine learning pipeline utilizing this outlier analysis to classify samples for potential methylation-associated congenital disorders. These methods are able to achieve high accuracy when used with machine learning methods to classify abnormal methylation patterns.
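The idea of an age- and sex-adjusted z-score at a single CpG site can be illustrated with a small sketch. This is not the authors' pipeline: the abstract describes a Generalized Additive Model, whereas the sketch below substitutes a plain per-sex linear fit on age for simplicity. The function names and the control values are hypothetical and exist only for illustration.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_controls(controls):
    """controls: (age, sex, methylation) tuples for healthy samples at one site.
    Returns {sex: (intercept, slope, residual_sd)}."""
    models = {}
    for sex in {s for _, s, _ in controls}:
        ages = [a for a, s, _ in controls if s == sex]
        levels = [m for _, s, m in controls if s == sex]
        intercept, slope = fit_line(ages, levels)
        residuals = [y - (intercept + slope * x) for x, y in zip(ages, levels)]
        models[sex] = (intercept, slope, stdev(residuals))
    return models

def adjusted_z(models, age, sex, level):
    """Z-score of a sample relative to the value expected for its age and sex."""
    intercept, slope, sd = models[sex]
    return (level - (intercept + slope * age)) / sd

# Hypothetical healthy-control values, invented for this example only.
controls = [
    (5, "F", 0.50), (20, "F", 0.53), (40, "F", 0.55), (60, "F", 0.60),
    (10, "M", 0.40), (30, "M", 0.45), (50, "M", 0.47), (70, "M", 0.52),
]
models = fit_controls(controls)
z = adjusted_z(models, age=30, sex="F", level=0.90)
```

A sample whose methylation level matches the expectation for its age and sex scores near zero, while a strongly hypermethylated sample yields a large positive z-score that a downstream classifier could use as a feature.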

Subjects

Beckwith-Wiedemann Syndrome , Silver-Russell Syndrome , Humans , Genomic Imprinting , DNA Methylation , Beckwith-Wiedemann Syndrome/diagnosis , Beckwith-Wiedemann Syndrome/genetics , Silver-Russell Syndrome/diagnosis , Silver-Russell Syndrome/genetics , Supervised Machine Learning

Multiomics single timepoint measurements to predict severe COVID-19 - Authors' reply.

Garapati, Kishore; Byeon, Seul Kee; Walsh, Jesse R; Jenkinson, Garrett; Cattaneo, Roberto; O'Horo, John C; Badley, Andrew D; Pandey, Akhilesh.

Lancet Digit Health ; 5(2): e57, 2023 02.

Article in English | MEDLINE | ID: mdl-36707188

Subjects

COVID-19 , Humans , Multiomics , SARS-CoV-2

Development of a multiomics model for identification of predictive biomarkers for COVID-19 severity: a retrospective cohort study.

Byeon, Seul Kee; Madugundu, Anil K; Garapati, Kishore; Ramarajan, Madan Gopal; Saraswat, Mayank; Kumar-M, Praveen; Hughes, Travis; Shah, Rameen; Patnaik, Mrinal M; Chia, Nicholas; Ashrafzadeh-Kian, Susan; Yao, Joseph D; Pritt, Bobbi S; Cattaneo, Roberto; Salama, Mohamed E; Zenka, Roman M; Kipp, Benjamin R; Grebe, Stefan K G; Singh, Ravinder J; Sadighi Akha, Amir A; Algeciras-Schimnich, Alicia; Dasari, Surendra; Olson, Janet E; Walsh, Jesse R; Venkatakrishnan, A J; Jenkinson, Garrett; O'Horo, John C; Badley, Andrew D; Pandey, Akhilesh.

Lancet Digit Health ; 4(9): e632-e645, 2022 09.

Article in English | MEDLINE | ID: mdl-35835712

ABSTRACT

BACKGROUND: COVID-19 is a multi-system disorder with high variability in clinical outcomes among patients who are admitted to hospital. Although some cytokines such as interleukin (IL)-6 are believed to be associated with severity, there are no early biomarkers that can reliably predict patients who are more likely to have adverse outcomes. Thus, it is crucial to discover predictive markers of serious complications. METHODS: In this retrospective cohort study, we analysed samples from 455 participants with COVID-19 who had had a positive SARS-CoV-2 RT-PCR result between April 14, 2020, and Dec 1, 2020 and who had visited one of three Mayo Clinic sites in the USA (Minnesota, Arizona, or Florida) in the same period. These participants were assigned to three subgroups depending on disease severity as defined by the WHO ordinal scale of clinical improvement (outpatient, severe, or critical). Our control cohort comprised 182 anonymised age-matched and sex-matched plasma samples that were available from the Mayo Clinic Biorepository and banked before the COVID-19 pandemic. We did a deep profiling of circulatory cytokines and other proteins, lipids, and metabolites from both cohorts. Most patient samples were collected before, or around the time of, hospital admission, representing ideal samples for predictive biomarker discovery. We used proximity extension assays to quantify cytokines and circulatory proteins and tandem mass spectrometry to measure lipids and metabolites. Biomarker discovery was done by applying an AutoGluon-tabular classifier to a multiomics dataset, producing a stacked ensemble of cutting-edge machine learning algorithms. Global proteomics and glycoproteomics on a subset of patient samples with matched pre-COVID-19 plasma samples were also done. FINDINGS: We quantified 1463 cytokines and circulatory proteins, along with 902 lipids and 1018 metabolites.
By developing a machine-learning-based prediction model, a set of 102 biomarkers, which predicted severe and clinical COVID-19 outcomes better than the traditional set of cytokines, were discovered. These predictive biomarkers included several novel cytokines and other proteins, lipids, and metabolites. For example, altered amounts of C-type lectin domain family 6 member A (CLEC6A), ether phosphatidylethanolamine (P-18:1/18:1), and 2-hydroxydecanoate, as reported here, have not previously been associated with severity in COVID-19. Patient samples with matched pre-COVID-19 plasma samples showed similar trends in multi-omics signatures along with differences in glycoproteomics profile. INTERPRETATION: A multiomic molecular signature in the plasma of patients with COVID-19 before being admitted to hospital can be exploited to predict a more severe course of disease. Machine learning approaches can be applied to highly complex and multidimensional profiling data to reveal novel signatures of clinical use. The absence of validation in an independent cohort remains a major limitation of the study. FUNDING: Eric and Wendy Schmidt.

Subjects

COVID-19 , Biomarkers , COVID-19/diagnosis , Cohort Studies , Cytokines , Humans , Lipidomics/methods , Lipids , Metabolomics/methods , Pandemics , Prognosis , Proteomics/methods , Retrospective Studies , SARS-CoV-2

COVID-19 Mortality Prediction From Deep Learning in a Large Multistate Electronic Health Record and Laboratory Information System Data Set: Algorithm Development and Validation.

Sankaranarayanan, Saranya; Balan, Jagadheshwar; Walsh, Jesse R; Wu, Yanhong; Minnich, Sara; Piazza, Amy; Osborne, Collin; Oliver, Gavin R; Lesko, Jessica; Bates, Kathy L; Khezeli, Kia; Block, Darci R; DiGuardo, Margaret; Kreuter, Justin; O'Horo, John C; Kalantari, John; Klee, Eric W; Salama, Mohamed E; Kipp, Benjamin; Morice, William G; Jenkinson, Garrett.

J Med Internet Res ; 23(9): e30157, 2021 09 28.

Article in English | MEDLINE | ID: mdl-34449401

ABSTRACT

BACKGROUND: COVID-19 is caused by the SARS-CoV-2 virus and has strikingly heterogeneous clinical manifestations, with most individuals contracting mild disease but a substantial minority experiencing fulminant cardiopulmonary symptoms or death. The clinical covariates and the laboratory tests performed on a patient provide robust statistics to guide clinical treatment. Deep learning approaches on a data set of this nature enable patient stratification and provide methods to guide clinical treatment. OBJECTIVE: Here, we report on the development and prospective validation of a state-of-the-art machine learning model to provide mortality prediction shortly after confirmation of SARS-CoV-2 infection in the Mayo Clinic patient population. METHODS: We retrospectively constructed one of the largest reported and most geographically diverse laboratory information system and electronic health record of COVID-19 data sets in the published literature, which included 11,807 patients residing in 41 states of the United States of America and treated at medical sites across 5 states in 3 time zones. Traditional machine learning models were evaluated independently as well as in a stacked learner approach by using AutoGluon, and various recurrent neural network architectures were considered. The traditional machine learning models were implemented using the AutoGluon-Tabular framework, whereas the recurrent neural networks utilized the TensorFlow Keras framework. We trained these models to operate solely using routine laboratory measurements and clinical covariates available within 72 hours of a patient's first positive COVID-19 nucleic acid test result. RESULTS: The GRU-D recurrent neural network achieved peak cross-validation performance with 0.938 (SE 0.004) as the area under the receiver operating characteristic (AUROC) curve. 
This model retained strong performance by reducing the follow-up time to 12 hours (0.916 [SE 0.005] AUROC), and the leave-one-out feature importance analysis indicated that the most independently valuable features were age, Charlson comorbidity index, minimum oxygen saturation, fibrinogen level, and serum iron level. In the prospective testing cohort, this model provided an AUROC of 0.901 and a statistically significant difference in survival (P<.001, hazard ratio for those predicted to survive, 95% CI 0.043-0.106). CONCLUSIONS: Our deep learning approach using GRU-D provides an alert system to flag mortality for COVID-19-positive patients by using clinical covariates and laboratory values within a 72-hour window after the first positive nucleic acid test result.
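The AUROC figures quoted in this abstract summarize how well a model's risk scores rank patients who died above those who survived. As a minimal sketch of that metric only (not the authors' evaluation code), AUROC can be computed as the fraction of positive-negative pairs that the scores order correctly, with ties counted as half. The function name and the toy scores below are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted risk per sample; labels: 1 = event, 0 = no event.
    Equals the probability that a random positive outranks a random negative.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # correctly ordered pair
            elif p == n:
                wins += 0.5      # tie counts as half
    return wins / (len(positives) * len(negatives))

# Toy example: three of the four positive-negative pairs are ordered correctly.
print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise form is quadratic in sample count, which is fine for illustration; for a cohort of thousands of patients a rank-based implementation would be the more practical choice.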

Subjects

COVID-19 , Clinical Laboratory Information Systems , Deep Learning , Algorithms , Electronic Health Records , Humans , Retrospective Studies , SARS-CoV-2

Tissue-specific gene expression and protein abundance patterns are associated with fractionation bias in maize.

Walsh, Jesse R; Woodhouse, Margaret R; Andorf, Carson M; Sen, Taner Z.

BMC Plant Biol ; 20(1): 4, 2020 Jan 03.

Article in English | MEDLINE | ID: mdl-31900107

ABSTRACT

BACKGROUND: Maize experienced a whole-genome duplication event approximately 5 to 12 million years ago. Because this event occurred after speciation from sorghum, the pre-duplication subgenomes can be partially reconstructed by mapping syntenic regions to the sorghum chromosomes. During evolution, maize has had uneven gene loss between each ancient subgenome. Fractionation and divergence between these genomes continue today, constantly changing genetic make-up and phenotypes and influencing agronomic traits. RESULTS: Here we regenerate the subgenome reconstructions for the most recent maize reference genome assembly. Based on both expression and abundance data for homeologous gene pairs across multiple tissues, we observed functional divergence of genes across subgenomes. Although the genes in the larger maize subgenome are often expressed more highly than their homeologs in the smaller subgenome, we observed cases where homeolog expression dominance switches in different tissues. We demonstrate for the first time that protein abundances are higher in the larger subgenome, but they also show tissue-specific dominance, a pattern similar to RNA expression dominance. We also find that pollen expression is uniquely decoupled from protein abundance. CONCLUSION: Our study shows that the larger subgenome has a greater range of functional assignments and that there is less overlap between the subgenomes in terms of gene functions than would be suggested by similar patterns of gene expression and protein abundance. Our study also revealed that some reactions are catalyzed uniquely by the larger and smaller subgenomes. The tissue-specific, nonequivalent expression-level dominance pattern observed here implies a change in regulatory control which favors differentiated selective pressure on the retained duplicates leading to eventual change in gene functions.

Subjects

Gene Expression Regulation, Plant/genetics , Gene Expression/genetics , Zea mays/genetics , Chromosome Mapping/methods , Evolution, Molecular , Gene Duplication , Gene Ontology , Genes, Plant , Genome, Plant , Phylogeny , Plant Proteins/biosynthesis , Plant Proteins/genetics , Pollen/genetics , Polyploidy

MaizeGDB 2018: the maize multi-genome genetics and genomics database.

Portwood, John L; Woodhouse, Margaret R; Cannon, Ethalinda K; Gardiner, Jack M; Harper, Lisa C; Schaeffer, Mary L; Walsh, Jesse R; Sen, Taner Z; Cho, Kyoung Tak; Schott, David A; Braun, Bremen L; Dietze, Miranda; Dunfee, Brittney; Elsik, Christine G; Manchanda, Nancy; Coe, Ed; Sachs, Marty; Stinard, Philip; Tolbert, Josh; Zimmerman, Shane; Andorf, Carson M.

Nucleic Acids Res ; 47(D1): D1146-D1154, 2019 01 08.

Article in English | MEDLINE | ID: mdl-30407532

ABSTRACT

Since its 2015 update, MaizeGDB, the Maize Genetics and Genomics database, has expanded to support the sequenced genomes of many maize inbred lines in addition to the B73 reference genome assembly. Curation and development efforts have targeted high quality datasets and tools to support maize trait analysis, germplasm analysis, genetic studies, and breeding. MaizeGDB hosts a wide range of data, with recent support for new data types including genome metadata, RNA-seq, proteomics, synteny, and large-scale diversity. To improve access to and visualization of these data types, several new tools have been implemented to: access large-scale maize diversity data (SNPversity), download and compare gene expression data (qTeller), visualize pedigree data (Pedigree Viewer), link genes with phenotype images (MaizeDIG), and enable flexible user-specified queries to the MaizeGDB database (MaizeMine). MaizeGDB also continues to be the community hub for maize research, coordinating activities and providing technical support to the maize research community. Here we report the changes MaizeGDB has made within the last three years to keep pace with recent software and research advances, as well as the pan-genomic landscape that cheaper and better sequencing technologies have made possible. MaizeGDB is accessible online at https://www.maizegdb.org.

Subjects

Computational Biology/methods , Databases, Genetic , Genome, Plant/genetics , Genomics/methods , Zea mays/genetics , Gene Expression Regulation, Plant , Genetic Variation , Information Storage and Retrieval/methods , Internet , Polymorphism, Single Nucleotide , Proteomics/methods , User-Computer Interface , Zea mays/metabolism

The quality of metabolic pathway resources depends on initial enzymatic function assignments: a case for maize.

Walsh, Jesse R; Schaeffer, Mary L; Zhang, Peifen; Rhee, Seung Y; Dickerson, Julie A; Sen, Taner Z.

BMC Syst Biol ; 10(1): 129, 2016 11 29.

Article in English | MEDLINE | ID: mdl-27899149

ABSTRACT

BACKGROUND: As metabolic pathway resources become more commonly available, researchers have unprecedented access to information about their organism of interest. Despite efforts to ensure consistency between various resources, information content and quality can vary widely. Two maize metabolic pathway resources for the B73 inbred line, CornCyc 4.0 and MaizeCyc 2.2, are based on the same gene model set and were developed using Pathway Tools software. These resources differ in their initial enzymatic function assignments and in the extent of manual curation. We present an in-depth comparison between CornCyc and MaizeCyc to demonstrate the effect of initial computational enzymatic function assignments on the quality and content of metabolic pathway resources. RESULTS: These two resources are different in their content. MaizeCyc contains GO annotations for over 21,000 genes that CornCyc is missing. CornCyc contains on average 1.6 transcripts per gene, while MaizeCyc contains almost no alternate splicing. MaizeCyc also does not match CornCyc's breadth in representing the metabolic domain; MaizeCyc has fewer compounds, reactions, and pathways than CornCyc. CornCyc's computational predictions are more accurate than those in MaizeCyc when compared to experimentally determined function assignments, demonstrating the relative strength of the enzymatic function assignment pipeline used to generate CornCyc. CONCLUSIONS: Our results show that the quality of initial enzymatic function assignments primarily determines the quality of the final metabolic pathway resource. Therefore, biologists should pay close attention to the methods and information sources used to develop a metabolic pathway resource to gauge the utility of using such functional assignments to construct hypotheses for experimental studies.

Subjects

Computational Biology , Zea mays/metabolism , Molecular Sequence Annotation , Plant Proteins/metabolism , Zea mays/enzymology

A computational platform to maintain and migrate manual functional annotations for BioCyc databases.

Walsh, Jesse R; Sen, Taner Z; Dickerson, Julie A.

BMC Syst Biol ; 8: 115, 2014 Oct 12.

Article in English | MEDLINE | ID: mdl-25304126

ABSTRACT

BACKGROUND: BioCyc databases are an important resource for information on biological pathways and genomic data. Such databases represent the accumulation of biological data, some of which has been manually curated from literature. An essential feature of these databases is the continuing data integration as new knowledge is discovered. As functional annotations are improved, scalable methods are needed for curators to manage annotations without detailed knowledge of the specific design of the BioCyc database. RESULTS: We have developed CycTools, a software tool which allows curators to maintain functional annotations in a model organism database. This tool builds on existing software to improve and simplify imports of user-provided annotation data into BioCyc databases. Additionally, CycTools automatically resolves synonyms and alternate identifiers contained within the database into the appropriate internal identifiers. CONCLUSIONS: Automating steps in the manual data entry process can improve curation efforts for major biological databases. The functionality of CycTools is demonstrated by transferring GO term annotations from MaizeCyc to matching proteins in CornCyc, both maize metabolic pathway databases available at MaizeGDB, and by creating strain-specific databases for metabolic engineering.
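The synonym-resolution step mentioned in this abstract can be pictured as a lookup from any known name to a single internal identifier. The sketch below is only an illustration of that general idea and does not reflect how CycTools or BioCyc is implemented. All identifiers, names, and function names here are hypothetical.

```python
def build_index(records):
    """records: {internal_id: [synonym, ...]}.
    Returns a case-insensitive map from each name to the set of matching IDs."""
    index = {}
    for internal_id, names in records.items():
        for name in [internal_id, *names]:
            index.setdefault(name.strip().lower(), set()).add(internal_id)
    return index

def resolve(index, name):
    """Return the unique internal ID for a name, or None if unknown or ambiguous."""
    hits = index.get(name.strip().lower(), set())
    return next(iter(hits)) if len(hits) == 1 else None

# Hypothetical records, invented for this example only.
records = {
    "FRAME-001": ["adh1", "alcohol dehydrogenase 1"],
    "FRAME-002": ["sh1", "shrunken1", "sus"],
    "FRAME-003": ["sus"],  # shares a synonym with FRAME-002
}
index = build_index(records)
print(resolve(index, "ADH1"))  # FRAME-001
print(resolve(index, "sus"))   # None, because two records share this name
```

Returning None for an ambiguous name, rather than picking one match, leaves the decision to a curator, which seems the safer default when importing annotations in bulk.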

Subjects

Computational Biology/methods , Data Curation/methods , Databases as Topic , Software
